Revisiting Success in Music Streaming: A Data-Driven Predictive Approach

 

Juan D Montoro-Pons & Manuel Cuadrado-García

Universitat de València

María Luisa Palma-Martos

Universidad de Sevilla

Outline

  1. Descriptive analysis of the dataset
  2. Unstructured data: networks (collaborations) and document-term matrices (DTMs) mapping artists onto tags
    • Centrality metrics from collaboration graph
    • Similarity measures and genre agreement/disagreement (DTM)
  3. Predictive models: statistical learning models
  4. Interpretability of predictive models: Shapley values
  5. To do: estimation of structural parameters (DoubleML)

Motivation

 

Cultural Economics meets machine learning

  • Incorporate flexible predictive models for learning structural (causal) parameters
  • Use of metrics derived from unstructured data (e.g., text or networks) to improve (predictive) accuracy/performance of economic/econometric models
  • Use of a purely empirical approach in the process of model selection and validation

Goals

Empirically related goals:

  • Investigate the predictive performance of alternative statistical learning models
  • Enlarge a well-known dataset using unstructured data from different sources
  • Incorporate user-related online practices to predict success
  • Interpret the contribution of the predictors to the outcome across the different models (hence allowing us to generate hypotheses)

The ultimate goal is to identify which features have an impact on success in the streaming economy.

The dataset

 

  • The primary dataset is retrieved from Spotify’s global weekly chart, covering the period from 29/09/2013 to 23/01/2025.

  • The sampling unit is a track (song)

  • Using web scraping and the Spotify API, we collect information about a track’s success (peak chart position, weeks at the peak position, maximum weekly streams, total streams, and a popularity index) together with a set of track and artist features: genres, album type, release date, artist popularity and followers, whether the track is a collaboration, the markets in which the album is available, and audio features of the track, among others. Features can be classified as

    • Track-specific
    • Album-related
    • Artist(s)-related (e.g., online tags from LastFM and MusicBrainz)
  • The dataset includes information on 4153 tracks by 1670 unique artists.
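
As an illustration of the collection step, a minimal sketch using the spotipy client (credentials and the track ID are placeholders; the exact fields gathered for the study may differ):

```python
# Minimal sketch of track retrieval via the Spotify Web API (spotipy client).
# Credentials and the track ID are placeholders, not those used in the study.
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(
    client_id="YOUR_CLIENT_ID", client_secret="YOUR_CLIENT_SECRET"))

track = sp.track("TRACK_ID")
audio = sp.audio_features(["TRACK_ID"])[0]  # danceability, energy, valence, ...

record = {
    "track_popularity": track["popularity"],
    "album_type":       track["album"]["album_type"],
    "release_date":     track["album"]["release_date"],
    "n_markets":        len(track["album"]["available_markets"]),
    "is_collab":        len(track["artists"]) > 1,
    "danceability":     audio["danceability"],
    "energy":           audio["energy"],
    "duration_s":       audio["duration_ms"] / 1000,
}
```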

Descriptive analysis

Describing the dataset: tracks

 

The dataset includes information on 4153 tracks. Some statistics:

 

| variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| weeks | 0 | 1.000 | 18.745 | 33.810 | 1.000 | 1.000 | 6.000 | 22.000 | 407.000 | ▇▁▁▁▁ |
| top_ten | 3680 | 0.114 | 8.951 | 8.917 | 1.000 | 2.000 | 6.000 | 14.000 | 64.000 | ▇▂▁▁▁ |
| streams | 0 | 1.000 | 155.566 | 337.316 | 0.174 | 8.262 | 32.860 | 146.228 | 4604.759 | ▇▁▁▁▁ |
| peek_position | 0 | 1.000 | 80.513 | 59.874 | 1.000 | 27.000 | 69.000 | 127.000 | 240.000 | ▇▅▃▃▁ |
| track_popularity | 0 | 1.000 | 54.469 | 23.105 | 0.000 | 50.000 | 60.000 | 69.000 | 90.000 | ▂▁▃▇▃ |
| loudness | 0 | 1.000 | -6.305 | 2.491 | -34.475 | -7.497 | -5.899 | -4.663 | 1.509 | ▁▁▁▇▇ |
| danceability | 0 | 1.000 | 0.681 | 0.142 | 0.150 | 0.590 | 0.695 | 0.786 | 0.985 | ▁▂▅▇▃ |
| energy | 0 | 1.000 | 0.642 | 0.166 | 0.028 | 0.537 | 0.657 | 0.766 | 0.989 | ▁▂▆▇▃ |
| valence | 0 | 1.000 | 0.484 | 0.222 | 0.032 | 0.313 | 0.481 | 0.656 | 0.976 | ▃▇▇▆▃ |
| speechiness | 0 | 1.000 | 0.126 | 0.118 | 0.023 | 0.045 | 0.074 | 0.172 | 0.966 | ▇▂▁▁▁ |
| duration_s | 0 | 1.000 | 208.258 | 49.196 | 35.240 | 178.107 | 203.641 | 230.953 | 613.027 | ▁▇▁▁▁ |

Describing the dataset: artists and collaborations

 

  • The dataset includes information on 1670 unique artists.

  • Of all the tracks, 45% are collaborations between artists, while 55% are solo tracks

  • Artist roles are split between:

    • solo (34% of occurrences)
    • lead (27%)
    • feature (39%).
  • The most frequent collaborations are between two artists: the mean, median and Q3 of the number of collaborators are 2.89, 2 and 3 respectively.

| artist_name | tracks | collab_ratio |
|---|---|---|
| Drake | 172 | 61% |
| Bad Bunny | 110 | 57.3% |
| Taylor Swift | 98 | 13.3% |
| Travis Scott | 84 | 73.8% |
| Future | 73 | 76.7% |
| Ariana Grande | 66 | 40.9% |
| 21 Savage | 63 | 76.2% |
| The Weeknd | 61 | 44.3% |
| Ed Sheeran | 56 | 50% |
| Lil Baby | 56 | 78.6% |

Describing the dataset: more on collaborations

 

  • The Spotify API provides a popularity index for each artist, as well as the artist’s number of followers on the platform.

  • For each collaboration we compute:

    • The joint popularity
    • The popularity_ratio of each artist
    • The joint number of followers
    • The followers_ratio of each artist
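
A sketch of how these quantities can be derived with pandas, assuming one row per (track, artist) pair; column names and values are illustrative, not the study’s exact schema:

```python
import pandas as pd

# One row per (track, artist) pair; toy values for illustration.
df = pd.DataFrame({
    "track_id":   ["t1", "t1", "t2", "t2", "t2"],
    "artist_id":  ["a1", "a2", "a1", "a3", "a4"],
    "popularity": [90, 60, 90, 75, 40],
    "followers":  [80e6, 5e6, 80e6, 12e6, 1e6],
})

grp = df.groupby("track_id")
df["joint_popularity"] = grp["popularity"].transform("sum")
df["joint_followers"]  = grp["followers"].transform("sum")

# Each artist's share of the collaboration's total popularity / followers.
df["popularity_ratio"] = df["popularity"] / df["joint_popularity"]
df["followers_ratio"]  = df["followers"]  / df["joint_followers"]
```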

Asymmetric collaborations

 

| measure | Role: feature | Role: lead |
|---|---|---|
| Share of artists that are more popular | 0.46 | 0.54 |
| Share of artists that have more followers | 0.39 | 0.55 |
| Average popularity ratio | 0.38 | 0.45 |
| Average followers ratio | 0.34 | 0.51 |

 

| measure | collab = FALSE | collab = TRUE |
|---|---|---|
| popularity_track | 55.612538 | 53.281826 |
| weeks_in_lists | 19.599478 | 17.106014 |
| top_ten | 8.599222 | 9.169291 |
| peek | 80.065738 | 82.141791 |
| times_peek | 3.188034 | 3.694323 |
| streams | 163.564835 | 139.716212 |

Network of collaborations

 

| artist_id | label | degree | eigen | closeness | betweenness |
|---|---|---|---|---|---|
| 3TVXtAsR1Inumwj472S9r4 | Drake | 132 | 1.0000000 | 0.0003401 | 31930.107 |
| 0Y5tJX1MQlPlqiwlOH1tJY | Travis Scott | 109 | 0.7049464 | 0.0003357 | 17987.294 |
| 4q3ewBCX7sLwd24euuV69X | Bad Bunny | 93 | 0.0559605 | 0.0003233 | 35985.381 |
| 1vyhD5VmyZ7KMfW5gqLgo5 | J Balvin | 84 | 0.0500565 | 0.0003630 | 53044.032 |
| 2R21vXR83lH98kGeO99Y66 | Anuel AA | 81 | 0.0346626 | 0.0003212 | 20480.496 |
| 1RyvyyTE3xzB2ZywiAwp0i | Future | 79 | 0.7645781 | 0.0003185 | 14613.223 |
| 50co4Is1HCEo8bhOyUWKpn | Young Thug | 78 | 0.6576292 | 0.0003368 | 11283.802 |
| 1i8SpTcr7yvPOmcqrbnVXY | Ozuna | 76 | 0.0321060 | 0.0003227 | 14830.585 |
| 77ziqFxp5gaInVrF2lj4ht | Sech | 73 | 0.0294024 | 0.0002987 | 3370.446 |
| 1URnnhqYAYcrqrcwql10ft | 21 Savage | 67 | 0.7061541 | 0.0003232 | 8718.683 |
| 5f7VJjfbwm532GiveGC0ZK | Lil Baby | 66 | 0.4302981 | 0.0003224 | 13987.667 |
| 1pQWsZQehhS4wavwh7Fnxd | Lenny Tavárez | 62 | 0.0211659 | 0.0002734 | 1332.630 |
| 0KPX4Ucy9dk82uj4GpKesn | Dalex | 61 | 0.0206827 | 0.0002710 | 1275.171 |
| 1SupJlEpv7RS2tPNRaHViT | Nicky Jam | 61 | 0.0256472 | 0.0003006 | 6835.123 |
| 329e4yvIujISKGKz1BZZbO | Farruko | 61 | 0.0247795 | 0.0003042 | 10499.564 |
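
A minimal sketch of how such metrics can be computed with networkx from the collaboration edge list (toy edges, not the actual graph):

```python
import networkx as nx

# Undirected collaboration graph: an edge links two artists that share
# at least one track (toy edge list for illustration).
G = nx.Graph()
G.add_edges_from([
    ("Drake", "Future"), ("Drake", "21 Savage"), ("Future", "21 Savage"),
    ("Drake", "Bad Bunny"), ("Bad Bunny", "J Balvin"),
])

degree      = dict(G.degree())                 # number of distinct collaborators
eigen       = nx.eigenvector_centrality(G)     # influence via influential neighbors
closeness   = nx.closeness_centrality(G)       # inverse average distance to others
betweenness = nx.betweenness_centrality(G)     # brokerage between communities
```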


Predicting success with classifiers

Interpretability of black-box models

Shapley values

  • Shapley values are a method from cooperative game theory that allows us to measure each feature’s individual contribution to a model’s prediction. Think of the model’s prediction as a “reward” that needs to be fairly distributed among “players” (the features). By calculating Shapley values, we determine how much each feature contributed to the prediction by seeing how the prediction changes when we add or remove that feature from various subsets of features.

  • To calculate Shapley values, we use a value function representing the model’s prediction based on any subset of features. This function lets us measure how much the prediction changes when a particular feature is added to a subset, capturing the marginal contribution of each feature.

  • Because the model’s prediction depends on all features—even those not in the subset we’re focusing on—we use marginalization to account for the remaining features by averaging their possible values. This lets us isolate the specific impact of the features in our subset.
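
In symbols: for a model \(f\) over feature set \(F\), a coalition \(S \subseteq F\) with observed values \(x_S\) has value

\[v(S) = \mathbb{E}_{X_{F \setminus S}}\left[ f(x_S, X_{F \setminus S}) \right]\]

where the expectation marginalizes over the features outside \(S\); in practice it is approximated by averaging the model output over a background sample.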

  • The Shapley value for a feature is computed by averaging its marginal contributions across all possible subsets of features. This ensures that each feature’s contribution is fairly represented. However, calculating Shapley values for all subsets becomes computationally intense as the number of features increases since there are exponentially many subsets to consider.
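
In standard notation, the Shapley value of feature \(j\) averages its marginal contribution over all coalitions \(S\) of the remaining features:

\[\phi_j = \sum_{S \subseteq F \setminus \{j\}} \frac{|S|!\,(|F|-|S|-1)!}{|F|!} \left[ v(S \cup \{j\}) - v(S) \right]\]

With \(|F|\) features there are \(2^{|F|-1}\) coalitions per feature, which is the exponential cost just mentioned.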

  • To simplify this, we use Monte Carlo sampling to approximate Shapley values by randomly sampling subsets of features rather than evaluating them all. This method involves comparing the model’s predictions on random subsets with and without a specific feature and then averaging these differences across samples of records to estimate the feature’s contribution.
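
A minimal sketch of this permutation-sampling estimator, assuming a model with a scikit-learn-style predict method; the background sample stands in for the marginalized features:

```python
import numpy as np

def shapley_mc(model, x, X_background, j, K=1000, seed=0):
    """Monte Carlo estimate of the Shapley value of feature j for instance x.

    Each draw samples a random feature ordering and a random background row,
    then compares predictions with and without feature j switched to x[j].
    """
    rng = np.random.default_rng(seed)
    n_features = X_background.shape[1]
    contribs = np.empty(K)
    for k in range(K):
        perm = rng.permutation(n_features)
        pos = int(np.where(perm == j)[0][0])
        z = X_background[rng.integers(len(X_background))]
        # Features ordered before j take their values from the instance x;
        # the remaining features keep their background values.
        x_plus = z.copy()
        x_plus[perm[:pos]] = x[perm[:pos]]
        x_plus[j] = x[j]
        x_minus = z.copy()
        x_minus[perm[:pos]] = x[perm[:pos]]   # feature j stays at background
        contribs[k] = (model.predict(x_plus[None, :])[0]
                       - model.predict(x_minus[None, :])[0])
    return contribs.mean()
```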

  • When estimating Shapley values through sampling, it’s important to evaluate how close these approximations are to the true values. Hoeffding’s inequality helps us by providing a statistical guarantee: it bounds the probability that the sample-based Shapley value deviates from the true Shapley value by more than a specified amount.

\[P\left(|\hat{\phi}_j - \phi_j| \geq \epsilon \right) \leq 2 \exp\left( -\frac{2 K \epsilon^2}{(b-a)^2}\right)\]

where \(\hat{\phi}_j\) and \(\phi_j\) denote the estimated and true Shapley values, \(\epsilon\) is the acceptable error margin, \(K\) is the number of samples, and \(b\) and \(a\) bound the range within which the model’s output varies for a single prediction. This shows that as we increase \(K\) the estimate becomes more reliable.

  • By applying Hoeffding’s inequality, we can establish that, for a sufficiently large number of samples \(K\), the probability that our Shapley value estimate deviates from the actual value by more than a small error margin \(\epsilon\) becomes very small.
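
Solving the bound for \(K\) makes this concrete: to guarantee \(P(|\hat{\phi}_j - \phi_j| \geq \epsilon) \leq \delta\) it suffices that

\[K \geq \frac{(b-a)^2}{2\epsilon^2} \ln\frac{2}{\delta}\]

For example, with \(b - a = 1\), \(\epsilon = 0.05\) and \(\delta = 0.05\), about \(K \geq 738\) samples per feature suffice.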

  • KernelExplainer estimates Shapley values by sampling subsets and comparing predictions with and without each feature. Although KernelExplainer is flexible, it uses marginal rather than conditional sampling and therefore assumes feature independence, which can bias the estimates when features are correlated. Given the natural grouping of features in our experiment, Owen values may have been a more suitable choice from the start, as they can account for feature interactions.
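
As a usage sketch (toy data and model, not the study’s pipeline), a typical KernelExplainer invocation looks like this:

```python
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingRegressor

# Toy data standing in for the track/artist feature matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=500)
model = GradientBoostingRegressor().fit(X, y)

# KernelExplainer marginalizes "missing" features over the background sample,
# which implicitly assumes feature independence (the bias discussed above).
background = shap.sample(X, 100)
explainer = shap.KernelExplainer(model.predict, background)
shap_values = explainer.shap_values(X[:5], nsamples=200)  # (5, 8) attributions
```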

  • Like Shapley values, Owen values aim to fairly distribute the model’s prediction among features. However, instead of evaluating features individually, Owen values allow us to create coalitions of features that act together.

  • To calculate Owen values:

    • We first form coalitions based on related features.
    • Then, we treat each coalition as a single “player” in the cooperative game, calculating the Shapley value for each coalition.
    • Finally, within each coalition, we distribute the coalition’s Shapley value among individual features based on their individual contributions within the coalition.
  • This two-step process (calculating Shapley values for coalitions, then distributing within each coalition) ensures that Owen values accurately reflect both individual and collective contributions, especially when features are interdependent.
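
A minimal exact sketch of the two nested games (exponential cost, so only feasible for a handful of features and groups; the value function v is assumed given, e.g. the marginalized prediction defined earlier):

```python
from itertools import combinations
from math import factorial

def owen_values(v, groups):
    """Exact Owen values for value function v over grouped features.

    v: callable mapping a frozenset of feature indices to a real number.
    groups: list of lists of feature indices (the a-priori coalitions).
    """
    m = len(groups)
    owen = {}
    for k, group in enumerate(groups):
        others = [g for idx, g in enumerate(groups) if idx != k]
        g = len(group)
        for i in group:
            rest = [j for j in group if j != i]
            phi = 0.0
            for r in range(m):  # outer game: subsets R of the other groups
                w_out = factorial(r) * factorial(m - r - 1) / factorial(m)
                for R in combinations(others, r):
                    base = frozenset(x for grp in R for x in grp)
                    for t in range(g):  # inner game: subsets T within i's group
                        w_in = factorial(t) * factorial(g - t - 1) / factorial(g)
                        for T in combinations(rest, t):
                            S = base | frozenset(T)
                            phi += w_out * w_in * (v(S | {i}) - v(S))
            owen[i] = phi
    return owen

# Sanity check with v(S) = |S| and groups [[0, 1], [2]]:
# every marginal contribution is 1, so each Owen value is 1.
print(owen_values(lambda S: float(len(S)), [[0, 1], [2]]))
```

In the SHAP library, this grouped scheme corresponds to the partition-based explainers built on a feature hierarchy; the sketch above just makes the two nested games explicit.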

  • In machine learning explainability, choosing between Shapley and Owen values is a strategic decision. Shapley values excel in analyzing independent contributions, while Owen values are better suited for capturing interactions within feature groups. This distinction is crucial when dependencies exist, as Owen values provide a more accurate reflection of joint influences.

  • When features are correlated, the use of KernelExplainer from the SHAP library can lead to biased results, since this method assumes all features are independent. In cases where features naturally form groups, Owen values offer a better alternative, capturing feature interactions through grouped coalitions. Combining Shapley and Owen values is beneficial in complex models with both independent and interdependent features.

Estimation of structural parameters
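
This step is still pending (item 5 in the outline). As a hedged sketch of what it could look like with the doubleml package: a partially linear model in which the effect of a collaboration dummy on (log) streams is the structural parameter, with gradient-boosted nuisance learners. All variable names, learners and data below are placeholders, not the study’s specification.

```python
import numpy as np
import pandas as pd
from doubleml import DoubleMLData, DoubleMLPLR
from sklearn.ensemble import GradientBoostingRegressor

# Toy data frame standing in for the track-level dataset.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "danceability":      rng.uniform(size=n),
    "artist_popularity": rng.uniform(0, 100, size=n),
    "degree_centrality": rng.uniform(size=n),
})
df["collab"] = (rng.uniform(size=n) < 0.45).astype(int)
df["log_streams"] = (0.5 * df["collab"] + 0.01 * df["artist_popularity"]
                     + rng.normal(size=n))

data = DoubleMLData(df, y_col="log_streams", d_cols="collab",
                    x_cols=["danceability", "artist_popularity",
                            "degree_centrality"])

# Partially linear regression: nuisance functions are learned flexibly;
# theta, the coefficient on `collab`, is the structural parameter.
dml_plr = DoubleMLPLR(data, ml_l=GradientBoostingRegressor(),
                      ml_m=GradientBoostingRegressor())
dml_plr.fit()
print(dml_plr.summary)  # point estimate, std. error, confidence interval
```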

References